18 research outputs found

    Algorithmic Analysis of Complex Audio Scenes

    In this thesis, we examine the problem of algorithmic analysis of complex audio scenes, with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher, and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well-annotated recordings from this archive has been available in digital form, little infrastructure for searching and sharing this data has been available. We describe a distributed infrastructure for searching, sharing and annotating animal sound collections collaboratively, which we have developed in this context. Although searching animal sound databases by metadata gives good results for many applications, annotating all occurrences of a specific sound is beyond the scope of human annotators. Moreover, finding vocalisations similar to a given example is not feasible using metadata alone. We therefore propose an algorithm for content-based similarity search in animal sound databases. Based on principles of image processing, we develop suitable features for the description of animal sounds. We enhance a concept for content-based multimedia retrieval by a ranking scheme which makes it an efficient tool for similarity search. One of the main sources of complexity in natural audio scenes, and the most difficult problem for pattern recognition, is the large number of sound sources which are active at the same time. We therefore examine methods for source separation based on microphone arrays. In particular, we propose an algorithm for the extraction of simpler components from complex audio scenes based on a sound complexity measure. Finally, we introduce pattern recognition algorithms for the vocalisations of a number of bird species. Some of these species are interesting for reasons of nature conservation, while one of the species serves as a prototype for songbirds with strongly structured songs.

    A covering problem that is easy for trees but NP-complete for trivalent graphs

    By definition, a P2-graph Γ is an undirected graph in which every vertex is contained in a path of length two. For such a graph, pc(Γ) denotes the minimum number of paths of length two that cover all n vertices of Γ. We prove that ⌈n/3⌉ ≤ pc(Γ) ≤ ⌊n/2⌋ and show that these upper and lower bounds are tight. Furthermore, we show that every connected P2-graph Γ contains a spanning tree T such that pc(Γ) = pc(T). We present a linear-time algorithm that produces optimal 2-path covers for trees. This is contrasted by the result that the decision problem of whether pc(Γ) = n/3 is NP-complete for trivalent graphs. This graph-theoretical problem originates from the task of searching a large database of biological molecules, such as the Protein Data Bank (PDB), by content.
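
The definition of pc(Γ) and the stated bounds can be checked on small graphs with a brute-force sketch. This is only an illustration written for this listing, not the linear-time tree algorithm from the paper: it enumerates every path of length two and tries covers of increasing size, so it is exponential in the worst case. The function names and the adjacency-dictionary input format are assumptions.

```python
from itertools import combinations
from math import ceil


def two_paths(adj):
    """Return the vertex sets of all paths of length two (three vertices, two edges)."""
    paths = set()
    for mid, nbrs in adj.items():
        # A path a - mid - c exists for every pair of distinct neighbours of mid.
        for a, c in combinations(sorted(nbrs), 2):
            paths.add(frozenset((a, mid, c)))
    return paths


def path_cover_number(adj):
    """Brute-force pc: smallest number of length-two paths covering all vertices."""
    vertices = set(adj)
    paths = list(two_paths(adj))
    for k in range(1, len(vertices) + 1):
        for combo in combinations(paths, k):
            if set().union(*combo) == vertices:
                return k
    raise ValueError("not a P2-graph: some vertex lies on no path of length two")


# Hypothetical example graphs, given as adjacency dictionaries.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star3 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}

for graph in (path4, star3, triangle):
    n = len(graph)
    pc = path_cover_number(graph)
    # Check the bounds ceil(n/3) <= pc <= floor(n/2) on each example.
    assert ceil(n / 3) <= pc <= n // 2
```

Paths in a cover may share vertices, which is why the four-vertex path needs two overlapping paths even though it has only four vertices.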

    Methods for the automatic recording of bird calls and songs in field ornithology

    This review presents our current knowledge on automated methods for the acoustic recording of bird calls and songs. Acoustic long-term recordings can serve as a basis for an automated bird census. We address the question of whether sound recordings are suitable for qualitative and quantitative analysis of bird populations. Special attention is devoted to autonomous recording methods and to the evaluation of long-term recordings using acoustic pattern recognition algorithms. We see realistic scenarios for the use of automated methods in field ornithology in the investigation of nocturnal bird migration, the census of nocturnal breeding bird species, and data collection in core areas of nature reserves.

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    Robust Identification of Time-Scaled Audio

    Automatic identification of audio titles on radio broadcasts is a first step towards automatic annotation of radio programmes. Systems designed for identification have to deal with a variety of post-processing steps potentially applied to audio material at the radio stations. One of the more difficult techniques to handle is time-scaling, i.e., the variation of playback speed. In this paper we propose a robust fingerprinting technique designed for the identification of time-scaled audio data. To allow for fast time-scale-invariant audio identification, the extracted fingerprints are used as input to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

    Automatic sentence boundary detection for German broadcast news

    In this work we aim to enrich the transcript of an automatic speech recognition system with punctuation by automatically detecting sentence ends. We make use of a simple word-based language model and combine it with a decision tree for the acoustic features of speech. The focus lies on selecting robust acoustic features that best reflect the prosodic characteristics of the German language. We arrive at a Sentence Unit Error Rate of 54, compared to the state-of-the-art rate for English of 61, by applying a comparable detection system. This is a sound indication that prosody is a stronger cue for the perception of sentence boundaries in German than in English. Our work is, to our knowledge, the first system developed for sentence boundary detection in the German broadcast news domain. Our results can therefore serve as a baseline for further studies in this scenario.

    DiSCo - A speaker and speech recognition evaluation corpus for challenging problems in the broadcast domain

    Baum D, Samlowski B, Winkler T, Bardeli R, Schneider D. DiSCo - A speaker and speech recognition evaluation corpus for challenging problems in the broadcast domain. In: GSCL Symposium Sprachtechnologie und EHumanities. 2009: 1-9.
    Systems for speech and speaker recognition already achieve low error rates when applied to high-quality audiovisual broadcast data, such as news shows recorded in a studio environment. Several evaluation corpora exist for this domain in various languages. However, in actual applications for broadcast data analysis, the data requirements are more complex. There are many data types beyond the planned speech of the news anchorperson. For example, interesting live recordings of prominent politicians are often made in environments with challenging acoustic properties. Discussions typically exhibit highly spontaneous speech, with different speakers talking at the same time. The performance of standard approaches to speech and speaker recognition typically deteriorates under such data characteristics, and dedicated techniques have to be developed to handle these problems. Corresponding evaluation corpora are needed which reflect the challenging conditions of the actual applications. Currently, no German evaluation corpus is available which covers the required acoustic conditions and diverse language properties. This contribution describes the design of a new speaker and speech recognition evaluation corpus for the broadcast domain, reflecting the typical problems encountered in actual applications.

    Speech recognition as a retrieval problem

    Common approaches to automatic speech recognition (ASR) are based on training statistical models for the acoustics of speech. In our work, a retrieval-based ASR system is developed that does not rely on training and is thus more flexible to apply. It is based on a set of known reference utterances for each word that may occur in a test string. A test word string is identified by finding the most similar reference for each word, using an approach based on dynamic time warping (DTW). The DTW variant suitable for recognizing strings of connected words is called level-building DTW, proposed by Myers and Rabiner in 1981. It uses a level-wise iteration to match each word in the test utterance with the most similar reference. In our work, an ASR system for connected digit recognition based on level-building DTW is developed, evaluated, and compared with a state-of-the-art HMM recognizer.
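
The retrieval idea behind DTW-based recognition can be sketched for the simpler isolated-word case. This is not the level-building variant described in the abstract, only plain DTW on one-dimensional sequences with nearest-reference lookup. The function names, the absolute-difference local cost, and the toy feature sequences are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance measure
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(test, references):
    """Return the label of the reference sequence closest to the test sequence."""
    return min(references, key=lambda label: dtw_distance(test, references[label]))


# Hypothetical one-dimensional "feature" sequences standing in for word utterances.
references = {"one": [1, 3, 1], "two": [5, 5, 0]}
stretched_one = [1, 1, 3, 3, 1]  # a slower rendition of "one"
print(recognize(stretched_one, references))
```

Because the warping path may repeat frames, the stretched test sequence aligns with its reference at zero cost, which is what makes the approach tolerant to differences in speaking rate. No model is trained: adding a word only requires adding a reference.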